Introduction

For the first quarter of this research contract, we report progress on the following
four tasks (as described in the contract):

Fuzzy set-based decision making methodologies;
Feature calculation;
Clustering for curve and surface fitting;
Acquisition of images.



Fuzzy set-based decision making methodologies


In this section, we describe the general structure for networks based on fuzzy set
connectives which we are using for information fusion and decision making in Space
Applications. We describe the structure and training techniques for such networks
consisting of generalized means and γ-operators. We are currently examining the use of
other hybrid operators in multicriteria decision making.

In complex computer vision systems, several sources of information (such as multi- 
spectral color sensors, range sensors, stereo views, different algorithms, multiple expert 
systems) are commonly employed in order to reduce the uncertainty and to resolve the 
ambiguity present in the information derived from a single information source (such as an 
intensity image). The advantages of multi-source fusion lie in redundancy,
complementarity, timeliness and cost of the information. Thus, there is a need for
methodologies that can aggregate inexact and incomplete information obtained from 
multiple sources in order to make decisions. The decisions may be of various types. For 
example, in segmentation based on region growing, one needs to decide if a homogeneity 
criterion is satisfied; in edge-based segmentation one needs to decide whether an edge is 
present or not; in object recognition, one needs to assign a class label to each object. 

One can also formulate this problem as a multi-criteria decision making problem as 
follows. The support for a decision may depend on supports for (or degrees of satisfaction 
of) several different criteria, and the degree of satisfaction of each criterion may in turn 
depend on degrees of satisfaction of other sub-criteria, and so on. Thus, the decision 
process can be viewed as a hierarchical network, where each node in the network 
"aggregates" the degree of satisfaction of a particular criterion from the observed support. 
The inputs to each node are the degrees of satisfaction of each of the sub-criteria, and the
output is the aggregated degree of satisfaction of the criterion. Thus, the decision making
problem reduces to i) determining the structure of the network to be used, ii) choosing the
nature of the connectives at each node of the network, and iii) computing the input supports
(degrees of satisfaction of criteria) based on observed features.

Fuzzy Aggregation Connectives 


Fuzzy set theory provides a host of very attractive aggregation connectives for 
integrating membership functions representing uncertain and subjective information. These
connectives can be categorized into the following three classes based on their aggregation
behavior: i) union connectives, ii) intersection connectives, and iii) compensative
connectives. Compensative connectives can be further classified into mean operators and
hybrid operators. In addition to these, there are also other types of operators such as the
OWA operators proposed by Yager, which are capable of modeling linguistic quantifiers
such as "at least" and "at most". These will not be discussed in this report.

The Union Connective 

The union connective has the property that the aggregated value is high whenever 
any one of the input values representing different features or criteria is high. The most 
popular union operator is the "max" operator. However, the max operator is the most 
pessimistic of all union operators. If we want to be more optimistic, we need to consider 
one of the many generalizations of the max operator. One such operator is the union 
operator defined by Yager, and is given by

    u(x_1, x_2, \ldots, x_n) = \min\left(1, \left(x_1^p + x_2^p + \cdots + x_n^p\right)^{1/p}\right).    (1)

It can be shown that the range of this operator is between max(x_1, x_2, ..., x_n) and 1, and by
varying the value of p, we can achieve the required degree of optimism.



The Intersection Connective 


The intersection connective has the property that the aggregated value is high only 
when all of the inputs are high. Several fuzzy intersection operators can be defined, 
depending on the conditions that we would like the intersection to satisfy. The "min"
operator is by far the most popular intersection operator. However, the min operator is the 
most optimistic of all intersection operators. To allow for different degrees of pessimism, 
one could choose any of the generalizations to the min operator. For example, the 
intersection operator due to Yager is given by

    i(x_1, x_2, \ldots, x_n) = 1 - \min\left[1, \left((1-x_1)^{-p} + (1-x_2)^{-p} + \cdots + (1-x_n)^{-p}\right)^{-1/p}\right].    (2)

Varying the value of p between 0 and -∞, we can achieve various degrees of pessimism.
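
As a quick numerical illustration of (1) and (2), the following Python sketch evaluates both Yager operators. The function names are hypothetical, and the intersection is parameterized here by a positive exponent q, which is an assumption about the sign convention (q plays the role of -p when p is taken negative in (2)).

```python
def yager_union(xs, p):
    """Yager union, eq. (1): min(1, (sum x_i^p)^(1/p)), with p > 0."""
    return min(1.0, sum(x ** p for x in xs) ** (1.0 / p))


def yager_intersection(xs, q):
    """Yager intersection: 1 - min(1, (sum (1 - x_i)^q)^(1/q)), with q > 0.

    Assumption: q = -p in the notation of eq. (2).
    """
    return 1.0 - min(1.0, sum((1.0 - x) ** q for x in xs) ** (1.0 / q))


if __name__ == "__main__":
    xs = [0.3, 0.6, 0.5]
    # Small exponents saturate (union -> 1, intersection -> 0); large
    # exponents approach max and min respectively.
    for e in (1, 2, 5, 50):
        print(e, round(yager_union(xs, e), 4), round(yager_intersection(xs, e), 4))
```

For large exponents the two functions approach max and min, which is the sense in which the exponent controls the degree of optimism or pessimism.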

Connectives with Compensatory Behavior 

In many decision-making situations one is likely to take a position between the two 
extremes of no compensation characterized by the intersection operators and of full 
compensation characterized by the union operators. In applications such as multifactorial 
evaluation (decision making based on several criteria), a certain amount of compensation is 
desirable. In other words, one might be willing to sacrifice a little on one factor, provided 
the loss is compensated by gain in another factor. For example, intensity and range 
information may be mutually compensatory in some situations. Several compensative 
operators have been proposed in the literature. These can be classified into two groups 
according to their origins: mean operators and hybrid operators. Mean operators are
defined through an axiomatic approach. Hybrid operators are defined as the weighted 
arithmetic or geometric mean of a pair of conventional union and intersection operators. 


Mean Operators and the Generalized Mean 

As has been pointed out in the literature, the mean operators are very effective in decision
making when the criteria are mutually compensable in nature. A mean operator m is a mapping
m: [0,1] x [0,1] → [0,1] such that

i. m(a,b) ≥ m(c,d) if a ≥ c and b ≥ d (monotonicity)

ii. min(a,b) ≤ m(a,b) ≤ max(a,b).

Among the mean operators that satisfy the above properties are the weighted arithmetic 
mean and the geometric mean. Another effective mean operator is the generalized mean first 
proposed by Dujmovic and later by Dyckhoff and Pedrycz. It is defined by

    g(x_1, x_2, \ldots, x_n; p, w_1, w_2, \ldots, w_n) = \left(\sum_{i=1}^{n} w_i x_i^p\right)^{1/p}.    (3)

The w_i's can be thought of as the relative importance factors for the different criteria, where

    w_1 + w_2 + \cdots + w_n = 1.    (4)

The generalized mean has several attractive properties. For example, the mean value always
increases with an increase in p. Thus, by varying the value of p between -∞ and +∞, we
can obtain all values between min and max. Therefore, in the extreme cases, this operator
can be used as union or intersection. Also, it can be shown that p = -1 gives the harmonic
mean, p = 0 gives the geometric mean (as a limiting case), and p = 1 gives the arithmetic
mean. We have suggested that one can also use the generalized mean to simulate linguistic
concepts such as "at least" and "at most" by choosing appropriate values for the parameters.
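
The special cases listed above can be checked numerically. The following is a minimal Python sketch of the generalized mean in (3) and (4); the function name is hypothetical, and p = 0 is handled explicitly as the limiting (geometric mean) case.

```python
import math


def generalized_mean(xs, ws, p):
    """Weighted generalized mean (sum w_i x_i^p)^(1/p), eq. (3).

    The weights must sum to 1, eq. (4). Inputs are assumed strictly
    positive so that negative p is well defined.
    """
    if abs(sum(ws) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    if p == 0:
        # Limit p -> 0: weighted geometric mean.
        return math.exp(sum(w * math.log(x) for w, x in zip(ws, xs)))
    return sum(w * x ** p for w, x in zip(ws, xs)) ** (1.0 / p)


if __name__ == "__main__":
    xs, ws = [0.2, 0.8], [0.5, 0.5]
    for p in (-100, -1, 0, 1, 100):
        print(p, generalized_mean(xs, ws, p))
```

With equal weights and inputs 0.2 and 0.8, the values increase with p and stay between min and max, moving from near 0.2 for very negative p to near 0.8 for very positive p.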



Hybrid Connectives and the γ-Model

The γ-model devised by Zimmermann and Zysno is an example of hybrid
operators, and it is defined by

    y = \left(\prod_{i=1}^{n} x_i^{\delta_i}\right)^{1-\gamma} \left(1 - \prod_{i=1}^{n} (1 - x_i)^{\delta_i}\right)^{\gamma},  where  \sum_{i=1}^{n} \delta_i = n  and  0 \leq \gamma \leq 1.    (5)

In (5), the x_i ∈ [0,1] are the n inputs to be aggregated, δ_i represents the weight associated
with x_i, and γ is a parameter that controls the degree of compensation between the union
and intersection parts. The dependence of y on the x_i and the δ_i has been omitted for
convenience of notation. The γ-model has been observed to provide a close match to human
decision makers.

The γ-model has some very attractive properties. It is a monotonically increasing
function with respect to x_i and γ, and hence (x_min)^n ≤ y ≤ 1 - (1 - x_max)^n, where
x_min = min(x_1, ..., x_n) and x_max = max(x_1, ..., x_n). It is to be noted that these limits
correspond to the "algebraic product" and the "algebraic sum" respectively. Since n ≥ 2,
this property shows that the γ-model can behave both as a union operator and an
intersection operator in addition to being a compensatory operator, and its range will suffice
for many applications.
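
A minimal Python sketch of the γ-model in (5) is given below to make the roles of the intersection part, the union part, and γ concrete. The function and variable names are hypothetical.

```python
import math


def gamma_model(xs, deltas, gamma):
    """Zimmermann-Zysno gamma-model, eq. (5).

    Requires sum(deltas) == len(xs) and 0 <= gamma <= 1.
    """
    n = len(xs)
    if abs(sum(deltas) - n) > 1e-9 or not 0.0 <= gamma <= 1.0:
        raise ValueError("need sum(deltas) == n and 0 <= gamma <= 1")
    # Weighted algebraic product (intersection part).
    intersection_part = math.prod(x ** d for x, d in zip(xs, deltas))
    # Weighted algebraic sum (union part).
    union_part = 1.0 - math.prod((1.0 - x) ** d for x, d in zip(xs, deltas))
    return intersection_part ** (1.0 - gamma) * union_part ** gamma


if __name__ == "__main__":
    xs, deltas = [0.4, 0.7], [1.0, 1.0]
    for g in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(g, gamma_model(xs, deltas, g))
```

Setting γ = 0 returns the pure intersection part and γ = 1 the pure union part; intermediate values interpolate between them geometrically.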




Learning the Structure and Parameters of Networks 

Although two-layer networks (with one level aggregation functions) perform well 
in simple situations, in more complex settings it becomes necessary to use a multi-layer 
aggregation scheme. The aggregation and propagation of degrees of satisfaction of criteria 
in hierarchical networks is not a difficult problem if the structure of the hierarchy is known 
and if the type of connective to be used at each node in the hierarchy is known. Sometimes 
this is the case. For example, in medical applications, the hierarchy of the symptoms and 
the diagnoses are fairly well known. However, in most situations we may have only an 
approximate idea of the structure of the hierarchy, the nature of the connective associated 
with each node, and the relevant criteria (features) to be used. We show that optimization 
procedures such as the gradient descent and the backpropagation algorithm can be used to 
determine the proper type of aggregation connective at each node and its parameters, given 
only an approximate structure of the network and given a set of training data that describe 
the desired behavior of the aggregation network in terms of inputs at the bottom-most level 
and the outputs at the top-most level. 

In a particular situation, the type of aggregation function to be used depends on the 
(conjunctive, disjunctive or compensative) nature of the problem as well as the desired 
(pessimistic or optimistic) attitude. In a previous paper, we described a method to 
determine the nature and parameter values of the aggregation functions in a hierarchical 
network when a mixture of all three classes of connectives is desirable. However, in this
report, we confine ourselves to networks that are entirely made up of either the generalized
mean or the γ-model. We believe that the flexibility and range of these two connectives
suffice for the type of applications we wish to consider. We now briefly describe the
learning procedure for these two cases. 



Let us assume that there are n inputs to the node, and the training data for this node
consists of N sets of inputs x_1k, ..., x_nk with N corresponding desired outputs y_k (where
k = 1, ..., N, and k denotes the set number). Each set represents a known situation (i.e., the
degrees of satisfaction of criteria and the corresponding decision made in that case). The
problem is to determine the best type of aggregation function and its parameters for this
node in such a way that the discrepancy between the desired and actual behavior is
minimized. One measure that is commonly used as discrepancy is the sum of squared
errors defined by

    E = \sum_{k=1}^{N} (f_k - y_k)^2.    (6)

In the above equation, f_k is the aggregation function evaluated at (x_1k, ..., x_nk).

Learning Using the Generalized Mean 

In this case, from (3) and (4) we see that f_k can be written as

    (7)

f_k is written in this form so that the w_i can be chosen free of the constraints in (4). We
initially set p and w_i to 1. One can also choose them randomly. Then, we update the
weights using the following equations based on gradient descent. It is to be noted that the
generalized mean is well-behaved everywhere except at p = 0, where the derivative is infinite.

    (8)

where η and η' are suitable positive constants and

    (9)

This process is repeated until there is no change in w_i and p. This happens when ∂E/∂w_i = 0
and ∂E/∂p = 0, i.e., when a minimum of E is reached. We have shown that the solution is
unique under practical conditions, and hence the gradient descent procedure should
converge to the global minimum. The choice of η and η' is very important and it determines
the speed and reliability of convergence. Since we start with a mean aggregation function,
if the training data is better described by a union (intersection) operator, then the value of p
will keep increasing (decreasing) and will not converge (i.e., will converge at ±∞).
However, the procedure presented here can be modified to deal with such situations.
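
The training loop described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the exact update rules of the report: the unconstrained weights are normalized as w_i^2 / sum_j w_j^2 (one way to satisfy (4) automatically), and the gradients are approximated by central differences rather than computed analytically. All function names are hypothetical.

```python
def gmean(x, w, p):
    """Generalized mean with unconstrained w.

    Assumption: effective weights are w_i^2 / sum_j w_j^2, so they are
    non-negative and sum to 1 for any real w.
    """
    total = sum(wi * wi for wi in w)
    return sum(wi * wi / total * xi ** p for wi, xi in zip(w, x)) ** (1.0 / p)


def sse(data, w, p):
    """Sum of squared errors, eq. (6), over (inputs, target) pairs."""
    return sum((gmean(x, w, p) - y) ** 2 for x, y in data)


def train(data, n, eta=0.1, eta_p=0.1, steps=2000, h=1e-6):
    """Plain gradient descent on w and p, started from w_i = 1 and p = 1.

    Gradients are central-difference approximations (an assumption made
    here for brevity).
    """
    w, p = [1.0] * n, 1.0
    for _ in range(steps):
        grad_w = []
        for i in range(n):
            up = w[:i] + [w[i] + h] + w[i + 1:]
            dn = w[:i] + [w[i] - h] + w[i + 1:]
            grad_w.append((sse(data, up, p) - sse(data, dn, p)) / (2 * h))
        grad_p = (sse(data, w, p + h) - sse(data, w, p - h)) / (2 * h)
        w = [wi - eta * gi for wi, gi in zip(w, grad_w)]
        p -= eta_p * grad_p
    return w, p


if __name__ == "__main__":
    grid = [0.1, 0.3, 0.5, 0.7, 0.9]
    # Synthetic targets from a known operator: p = 3, weights 0.8 and 0.2.
    data = [((a, b), (0.8 * a ** 3 + 0.2 * b ** 3) ** (1 / 3))
            for a in grid for b in grid]
    w, p = train(data, n=2)
    total = sum(v * v for v in w)
    print("p =", p, "weights =", [v * v / total for v in w],
          "E =", sse(data, w, p))
```

The step sizes eta and eta_p correspond to η and η' in the text; values that are too large can make the iteration unstable, which is why small defaults are used in this sketch.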

Learning Using the γ-Model

The γ-model can behave like a union operator or an intersection operator or a
compensation operator, depending on the value of γ. Since the γ-model is continuous and
differentiable with respect to γ and δ_i, we can again use gradient descent methods to arrive
at the values of γ and δ_i that best match the given inputs and the corresponding desired
outputs. To eliminate the constraints on γ and δ_i in (5), we first modify the definition of γ
and δ_i as follows.

    \gamma = \frac{a^2}{a^2 + b^2}  and  \delta_i = \frac{n d_i^2}{\sum_{k=1}^{n} d_k^2}.    (12)

In (12), we can choose a, b and d_i without any constraints and still satisfy the constraints
on γ and δ_i. It is easily verified that

where

    y_1 = \prod_{i=1}^{n} x_i^{\delta_i}  and  y_2 = 1 - \prod_{i=1}^{n} (1 - x_i)^{\delta_i}.

Using the above partial derivatives, we can update the values of γ and δ_i to minimize the
discrepancy that reflects the error between desired values of aggregation y_k and computed
values f_k = f(x_1k, ..., x_nk; d_1, ..., d_n). We first update a, b, and d_i using

    (17)

    (18)

The partial derivatives in (16)-(18) are given in (13) and (14). From a^new, b^new and
d_i^new, we can update the new values of γ and δ_i using (12).

This training procedure can be extended to a general situation where there are 
several nodes arranged in a hierarchical network. In this case, the training data normally 
consists of input values at the bottom-most layer and the desired outputs at the top-most 
layer. The extension can be done by using the backpropagation algorithm.

The time required for convergence of this algorithm tends to be very large if it is 
implemented using the simple backpropagation technique. There are several ways to 
improve the speed of convergence. Also, a common disadvantage of all gradient descent 
methods is that they may get trapped in a local minimum. However, we would like to note 
that our training scheme does not necessarily have to use gradient descent methods. We 
intend to use other optimization techniques such as the random search method or the 
differential equation method to overcome the problems mentioned above. 

Our goal in subsequent quarters is to design appropriate hierarchical decision 
networks for segmentation, recognition, and pose estimation of Space Objects. These 
networks will be fed by membership values generated from relevant features and outputs of 
low level vision algorithms run on the images. They will be trained on sample images, and 
tested with “unknown” data. 



Feature Calculation 


Since we are in the early stages of collecting digital images of Space Objects, it is 
somewhat premature to discuss the features which will be used in the aggregation networks 
for object recognition and pose estimation. We have implemented numerous classical
features on image regions such as:

Gray level statistics;
Edge and curve primitives;
Texture measures from the co-occurrence matrix;
Size and shape parameters.

In addition, we have pioneered the use of several fractal geometric features which 
may have a considerable impact on characterizing “cluttered” background, such as clouds, 
dense star patterns, or some planetary surfaces. Should range imagery become available, 
we have also introduced several features (and algorithms) which utilize differential 
geometric models. 

As a natural result of using fuzzy clustering algorithms, we will be able to derive 
experimental measures of the “goodness” of a particular feature set toward the problem at 
hand. These experiments will be described in detail in a future report. 


Clustering for Curve and Surface Fitting 


The best way to describe the new work in this task is to include a copy of a
manuscript recently submitted by Dr. Krishnapuram and two of the graduate students
supported by this contract to the IEEE Transactions on Neural Networks.

The title of the paper is:

“The Fuzzy C-Shells Algorithm: A New Approach”.

We are currently extending this work to clustering edge data into general quadratic
curves, as well as extending this approach to 3-dimensional data sets (i.e., surfaces).



Acquisition of Images


In June, the PI traveled to Houston (in conjunction with a joint trip with Bob Lea to 
the MCC Fuzzy Systems Conference in Austin). While at NASA, meetings were held with 
Drs. Lea, Pal, and Cleghom about the availability and type of imagery to be used in the 
project. This group also visited Dr. Richard Juday to discuss possible collaboration or at
least a sharing of data.

NASA personnel are in the process of acquiring suitable simulation data and 
hopefully videotaped actual shuttle imagery. We have the capability in the Computer Vision 
Lab. at MU to digitize directly from a standard VCR. While we are waiting for this real (or 
simulated real) imagery, we have been digitizing photographs to use in our algorithms. 
Also, we have assembled a model of the shuttle, and are constructing a mechanism to orient 
this model in 3-D to digitize for experiments on pose estimation. Absolute perfection of 
details for this work is less important than the knowledge of the actual pose parameters to 
compare with the calculated estimates. As with the section on feature selection, more 
detailed exposition of this task will be included in subsequent reports. 



The Fuzzy C-Shells Algorithm:
A New Approach

Raghu Krishnapuram, Olfa Nasraoui, and Hichem Frigui
Department of Electrical and Computer Engineering
University of Missouri, Columbia, MO 65211

Abstract 

The Fuzzy C-Shells (FCS) algorithm is specially designed to search for clusters that can be
described by circular arcs, or more generally by shells of hyperspheres. In this paper, a new
approach to the FCS algorithm is presented. This algorithm is computationally and
implementationally simpler than other clustering algorithms that have been suggested for this
purpose. An unsupervised algorithm which automatically finds the optimum number of clusters is
also proposed. This algorithm can be used when the number of clusters is not known, and uses a
new cluster validity measure. Experimental results on several data sets are presented.

1. Introduction


Many fuzzy (and hard) clustering algorithms have been suggested and used in the literature 
to partition data into clusters. There is a whole class of clustering algorithms in which an objective 
function based on a distance measure is iteratively minimized to obtain the final partition. The 
distance measure chosen and the objective function being optimized depend on the geometric 
structure of the clusters. Different distances have been invented to search for clusters of specific 
shapes in the feature space. For example, the K-means algorithm, using the Euclidean distance,
looks for clusters that are hyperspherical in shape. Until recently it has been difficult to detect
clusters that can be described by circular arcs, or more generally by shells of hyperspheres. Dave's
[1,2] Fuzzy C-Shells (FCS) algorithm has proved to be successful in detecting such clusters, and
several impressive examples involving two-dimensional data sets are given in [1,2]. This algorithm
has also been generalized to the case of elliptical shells [3,4]. However, the FCS algorithm is
somewhat complex to implement since it requires the use of Newton's method to solve two
coupled nonlinear equations for the center and radius of each cluster in each iteration. Bezdek et al.
have suggested a modification to this algorithm to reduce the computational burden due to the use
of Newton's method [5]. In this paper, we propose a new FCS algorithm to overcome this
problem. Unlike Dave's method, our method does not involve nonlinear equations. This makes
our algorithm straightforward, and more importantly, computationally more attractive. In addition,
we also propose an unsupervised algorithm to determine the optimum number of clusters C, when 
this is not known. This unsupervised algorithm involves minimizing a new validity (performance) 
measure called the total average shell thickness. 

In section 2, we present the hard and fuzzy versions of our C-Shells algorithm. In section
3, we introduce our new cluster validity measure and describe an unsupervised algorithm which
can be used to determine the optimum number of clusters when this is not known a priori. In
section 4, several examples of clustering using the proposed unsupervised algorithm are shown.
Finally, section 5 gives the summary and conclusions.

2. The C-Shells Algorithms 


Let x_j be a point in the feature space. We assume that each cluster resembles a
hyperspherical shell. Therefore, the prototypes λ_i consist of two parameters (c_i, r_i), where c_i is the
center of the hypersphere and r_i is the radius. We define the distance from x_j to a prototype λ_i =
(c_i, r_i) as

    d_{ij}^2 = d^2(x_j, \lambda_i) = \left(\|x_j - c_i\|^2 - r_i^2\right)^2.    (1)

Note that the right hand side of (1), when equated to zero, also gives the equation of the
hypersphere. In general, the closer x_j is to the specific hypersphere, the smaller the distance will
be. Based on this distance measure, we now define the hard and fuzzy C-Shells algorithms.


2.1 The C-Shells Algorithm: The Hard Case

We define the objective function to be minimized in this case as

    J(L) = \sum_{i=1}^{K} \sum_{x_j \in \lambda_i} d_{ij}^2,    (2)

where L = (λ_1, ..., λ_K), and K is the number of clusters. In order to minimize the objective function
in (2), we rewrite the distance in (1) as

    d_{ij}^2 = p_i^T M_j p_i + v_j^T p_i + b_j,

where

    p_i = [-2 c_i^T, \; c_i^T c_i - r_i^2]^T,  y_j = [x_j^T, \; 1]^T,
    b_j = (x_j^T x_j)^2,  M_j = y_j y_j^T,  and  v_j = 2 (x_j^T x_j) y_j.    (3)

Therefore,

    J(L) = \sum_{i=1}^{K} \sum_{x_j \in \lambda_i} \left[ p_i^T M_j p_i + v_j^T p_i + b_j \right].    (4)

We may assume that the vectors p_i are independent of each other. Hence, the vectors p_i that
minimize (4) must satisfy

    \sum_{x_j \in \lambda_i} (2 M_j p_i + v_j) = 0.    (5)

If we define

    H_i = \sum_{x_j \in \lambda_i} M_j,  and  w_i = \sum_{x_j \in \lambda_i} v_j,    (6)

from (5) we obtain

    p_i = -\frac{1}{2} H_i^{-1} w_i.    (7)

The resulting Hard C-Shells (HCS) algorithm is summarized below.

THE HARD C-SHELLS (HCS) ALGORITHM:

Fix the number of clusters K;
Set iteration counter l = 1 and initialize the hard K-partition;
Repeat
    Calculate H_i^(l) and w_i^(l) for each cluster using (6);
    Compute p_i^(l) for each cluster using (7);
    Classify x_j into cluster λ_i if d_ij^2 < d_kj^2, for all k ≠ i;
    Increment l;
Until ( || p^(l-1) - p^(l) || < ε );
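
The closed-form prototype update in (6) and (7) can be illustrated for a single cluster with the following Python sketch. It assumes the parameter vector stacks -2c and c^T c - r^2, which is what expanding the distance in (1) gives; the function names and the optional per-point weights (which would play the role of the membership terms in the fuzzy case) are hypothetical. A small Gaussian elimination routine is included only to keep the example self-contained.

```python
import math


def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: points may be collinear or too few")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_shell(points, weights=None):
    """Return (center, radius) minimizing sum_j weight_j * d_j^2 for one shell."""
    dim = len(points[0])
    weights = weights or [1.0] * len(points)
    h = [[0.0] * (dim + 1) for _ in range(dim + 1)]
    w = [0.0] * (dim + 1)
    for x, u in zip(points, weights):
        y = list(x) + [1.0]                  # y_j = [x_j, 1]
        sq = sum(v * v for v in x)           # x_j^T x_j
        for r in range(dim + 1):
            w[r] += u * 2.0 * sq * y[r]      # accumulates v_j = 2 (x_j^T x_j) y_j
            for c in range(dim + 1):
                h[r][c] += u * y[r] * y[c]   # accumulates M_j = y_j y_j^T
    p = [-0.5 * v for v in solve_linear(h, w)]   # p = -1/2 H^{-1} w, eq. (7)
    center = [-0.5 * v for v in p[:dim]]         # first dim entries of p are -2c
    radius = math.sqrt(sum(c * c for c in center) - p[dim])  # last entry is c^T c - r^2
    return center, radius


if __name__ == "__main__":
    # Four points on the circle with center (2, -1) and radius 3.
    pts = [(5.0, -1.0), (2.0, 2.0), (-1.0, -1.0), (2.0, -4.0)]
    print(fit_shell(pts))
```

Because the objective is quadratic in the parameter vector, a single linear solve replaces the Newton iterations needed when optimizing over center and radius directly. For noisy data the quantity under the square root is not guaranteed to be positive, which a production implementation would need to guard against.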


2.2 The C-Shells Algorithm: The Fuzzy Case

For the fuzzy case, we minimize the following objective function:

    J(L, U) = \sum_{i=1}^{K} \sum_{j=1}^{N} (\mu_{ij})^m d_{ij}^2.    (8)

In (8), N is the total number of feature vectors, and U = [μ_ij] is a K x N matrix called the fuzzy
K-partition matrix [6] satisfying the following conditions:

    \mu_{ij} \in [0,1] for all i and j,  \sum_{i=1}^{K} \mu_{ij} = 1 for all j,  and  0 < \sum_{j=1}^{N} \mu_{ij} < N for all i.

μ_ij is the grade of membership of the feature point x_j in cluster λ_i, and m ∈ [1,∞) is a weighting
exponent called the fuzzifier. As in the hard case, it is easy to show that the vectors p_i that
minimize (8) are given by (7), where

    H_i = \sum_{j=1}^{N} (\mu_{ij})^m M_j,  and  w_i = \sum_{j=1}^{N} (\mu_{ij})^m v_j,    (9)

and v_j and M_j are given by (3). Following Bezdek's theorem for the fuzzy C-means [6], it can
be shown that the memberships will be updated according to

    \mu_{ik} =
    \begin{cases}
      \left[ \sum_{j=1}^{K} \left( d_{ik}^2 / d_{jk}^2 \right)^{1/(m-1)} \right]^{-1} & \text{if } I_k = \emptyset, \\
      0, \quad i \notin I_k & \text{if } I_k \neq \emptyset, \\
      1, \quad i \in I_k & \text{if } I_k \neq \emptyset,
    \end{cases}    (10)

where I_k = {i | 1 ≤ i ≤ K, d_ik^2 = 0}. The resulting Fuzzy C-Shells (FCS) algorithm is summarized
below.

THE FUZZY C-SHELLS (FCS) ALGORITHM:

Fix the number of clusters K; fix m, 1 < m < ∞;
Set iteration counter l = 1;
Initialize the fuzzy K-partition U^(0);
Repeat
    Calculate H_i^(l) and w_i^(l) for each cluster λ_i using (9);
    Compute p_i^(l) for each cluster λ_i using (7);
    Update U^(l) using (10);
    Increment l;
Until ( || U^(l-1) - U^(l) || < ε );

Both the hard and fuzzy C-shells algorithms require the inversion of the matrix H_i. This is quite
trivial when the feature space is two-dimensional or three-dimensional. In the hard case, the
inverse will exist if there are at least n+1 non-collinear points in each cluster, where n is the
dimensionality of the feature space. In the fuzzy case, theoretically the inverse will always exist as
long as N > n+1 and the feature vectors are not collinear.
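
The membership update in (10) can be sketched in Python as follows, assuming the standard fuzzy C-means form with squared distances. The function name is hypothetical, and splitting the membership equally when a point has zero distance to more than one prototype is an assumption made here so that the memberships of each point still sum to 1.

```python
def update_memberships(dist2, m):
    """Fuzzy membership update in the spirit of eq. (10).

    dist2[i][k] is the squared distance of point k to prototype i;
    m > 1 is the fuzzifier. Returns a K x N list of memberships.
    """
    k_clusters, n_points = len(dist2), len(dist2[0])
    u = [[0.0] * n_points for _ in range(k_clusters)]
    for k in range(n_points):
        zero = [i for i in range(k_clusters) if dist2[i][k] == 0.0]
        if zero:
            # Point lies exactly on one or more shells: full membership
            # there (shared equally if several; an assumption).
            for i in zero:
                u[i][k] = 1.0 / len(zero)
        else:
            for i in range(k_clusters):
                u[i][k] = 1.0 / sum(
                    (dist2[i][k] / dist2[j][k]) ** (1.0 / (m - 1.0))
                    for j in range(k_clusters)
                )
    return u


if __name__ == "__main__":
    # Two prototypes, two points; the second point lies on the first shell.
    print(update_memberships([[1.0, 0.0], [4.0, 2.0]], m=2.0))
```

Larger values of m flatten the memberships toward 1/K, which is the sense in which the fuzzifier makes the partition "fuzzier".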

3. Determination of the Optimal Number of Clusters 


The algorithm discussed in Section 2 assumes that the number of clusters K is known.
When this is not the case, one method to determine the optimal number of clusters is to perform
clustering for a range of K values, and pick the K value for which a suitable performance measure
is minimized (or maximized). We define a new performance (or cluster validity) measure called
the total average shell thickness as follows.

In the hard case, the total average shell thickness is defined as

    T(K) = \sum_{i=1}^{K} \frac{\sum_{x_j \in \lambda_i} \left(\|x_j - c_i\| - r_i\right)^2}{N_i},    (11)

where N_i is the number of points in cluster λ_i. In the fuzzy case, the total fuzzy average shell
thickness is defined to be

    T_f(K) = \sum_{i=1}^{K} \frac{\sum_{j=1}^{N} (\mu_{ij})^m \left(\|x_j - c_i\| - r_i\right)^2}{\sum_{j=1}^{N} (\mu_{ij})^m}.    (12)

Thus, to find the optimum number of clusters, one can start with K = 1, and keep
incrementing K while calculating T(K) after each run of the FCS algorithm, and stop as soon as a
local minimum of T(K) is found (or K reaches K_max). However, this simple method sometimes
finds a solution in which some of the circular or hyperspherical shells are split into two or more
subclusters (usually when K is larger than the actual number of clusters). Therefore, merging back
all compatible clusters into one cluster is necessary. Two clusters λ_i and λ_j are considered
compatible if

    \|c_i - c_j\| < \varepsilon_1,  and  \|r_i - r_j\| < \varepsilon_2.    (13)
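
The validity measure and the compatibility test (13) can be sketched in Python as follows. The thickness function assumes the hard-case reading in which each cluster contributes the mean squared radial deviation of its points from its shell; the function names and data layout are hypothetical.

```python
import math


def average_shell_thickness(clusters):
    """Hard-case total average shell thickness.

    clusters is a list of (center, radius, points) tuples. Assumption:
    each cluster contributes the mean of (||x - c|| - r)^2 over its points.
    """
    total = 0.0
    for center, radius, points in clusters:
        dev2 = [(math.dist(x, center) - radius) ** 2 for x in points]
        total += sum(dev2) / len(points)
    return total


def compatible(proto_a, proto_b, eps1, eps2):
    """Compatibility test (13) for two prototypes given as (center, radius)."""
    (ca, ra), (cb, rb) = proto_a, proto_b
    return math.dist(ca, cb) < eps1 and abs(ra - rb) < eps2


if __name__ == "__main__":
    pts = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
    print(average_shell_thickness([((0.0, 0.0), 1.0, pts)]))  # points lie on the shell
    print(compatible(((0.0, 0.0), 5.0), ((1.0, 1.0), 5.5), 2.0, 2.0))
```

A thickness of zero means every point lies exactly on its shell, so lower values of the measure indicate a tighter fit.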

When minimizing the validity measure for a range of K values, sometimes the algorithm 
finds a few small spurious clusters. These spurious clusters frequently arise due to noise points.
Such tiny clusters are not compatible with any of the rest of the clusters, and hence merging cannot
correct this problem. To eliminate such spurious clusters which contain too few points, we just
discard the prototypes for the small clusters, and rerun the algorithm (after decrementing K by the
number of tiny clusters), using the remaining prototypes as the initial guesses (i.e., skip the first
two steps inside the Repeat loop of the C-shells algorithms). This forces the points belonging to 
the spurious clusters to be reassigned to the best-fitting clusters. This procedure is repeated until no 
more elimination takes place. The unsupervised algorithm that finds the optimum number of 
clusters taking into account the problems mentioned above, is summarized below. 

THE UNSUPERVISED C-SHELLS ALGORITHM:

Set K = 1; fix m, 1 < m < ∞;
local_min = false;
While K < K_max and local_min = false do
    Initialize the fuzzy K-partition U^(0);
    Perform the C-Shells algorithm with the number of clusters = K;
    Store the final K prototypes;
    Calculate T(K) as given by (11) or (12);
    If T(K-1) is a significant local minimum Then
        local_min = true;
        K_optimal = K - 1;
    Else
        K = K + 1;
    End If
End While
Merge compatible prototypes among the K_optimal prototypes and update K_optimal;
Update U using new prototypes and (10);
Do
    Eliminate tiny clusters and decrement K_optimal accordingly;
    Perform the C-Shells algorithm with the new K_optimal;
Until No More Elimination Takes Place

4. Experimental Results


Although the algorithms presented in the previous sections are applicable to feature spaces 
of any dimension, we present only results of two-dimensional data sets here. We found that the 
HCS algorithm is much faster than the FCS algorithm, but performs well only when the data set is 
"clean". This is because the HCS algorithm has a higher tendency to get stuck in local minima, and 
sometimes it terminates abruptly due to the occurrence of singular matrices. Therefore, the HCS 
algorithm is not very robust, and we do not present the results of the HCS algorithm in this paper. 

In all the examples shown in this paper, the FCS algorithm was applied with the fuzzifier 
m = 5. Smaller values did not yield good results. This may be because we initialize the fuzzy 
partition matrix U with the fuzzy C means algorithm [6] which does not yield a good partition of 
the clusters, particularly in the case of overlapping or concentric circles. By making the partitioning 
as fuzzy as possible, it is possible to disentangle the overlapping clusters from each other using the 
FCS algorithm. The value of ε_1 and ε_2 used was 2 (see Eq. (13)).

The data sets were artificially generated, and had between 50 and 200 feature points.
Uniformly distributed noise with an interval of 3 was added to the feature point locations so that
they do not always lie exactly on the ideal circles. In addition, noise points were added at random
locations to some of the data sets. Any cluster with fewer than 5 points was considered a spurious
cluster.

The first example consists of two concentric circles contaminated by a few noise points.
This is an example where conventional clustering methods fail miserably. The unsupervised FCS
algorithm stops at K = 3, after finding a local minimum in the total fuzzy average shell thickness
performance measure T_f(K) at K = 2. The plot of T_f(K) versus K is shown in Fig. 1(a). In this
case, no merging or small-cluster elimination was required, and the final result is shown in Fig.
1(b). The values of T_f(K) beyond K = 3 were obtained by expressly letting the algorithm run, even
though it actually stops at K = 3. The bold line in Fig. 1(a) depicts the actual running path of the
algorithm. It can be seen that there are other local minima at K = 5, 7, and 9. At these values, the
partition is still acceptable and a final value of K = 2 would have been obtained after merging
compatible clusters and eliminating tiny clusters. However, the algorithm is designed to stop as
soon as the first local minimum is detected to eliminate unnecessary running time. As seen in Fig.
1(b), the two concentric circles are correctly classified, and the noise points are assigned to the
closest cluster.

Fig. 1(c) shows T_f(K) versus K for the data set in Fig. 1(d) (“the crying baby”). This is a
very difficult example because the clusters have wide-ranging radii, and the outer cluster
completely encloses all the remaining clusters. Thus, a truly global search for clusters is required,
which can be achieved only by a relatively large m (= 5). Again, the bold line in Fig. 1(c) depicts
the actual running path of the algorithm which stops at K = 8 (as soon as it detects a local minimum
at K = 7). At this point, the algorithm merges two compatible clusters into one cluster, and 
reclusters the data set using the prototypes obtained after merging as the initial guesses. In this run, 
one tiny cluster is eliminated, and the remaining 5 prototypes are used as the initial values. This 
forces the few points belonging to the tiny cluster to be assigned to the remaining clusters. The 
final result is K = 5, as shown in Fig. 1(d). 

Four more examples are shown in Fig. 2. Fig. 2(a) shows the result of clustering two
semicircles contaminated by noise. This example shows that the algorithm is successful even when
only parts of circles are present. Fig. 2(b) shows the results of clustering three overlapping circles
contaminated by noise, and Fig. 2(c) shows the clustering of five sparsely sampled overlapping
circles. These are both very difficult cases, because the circles are truly entangled, and the initial
partition is quite wrong. Fig. 2(d) shows the result of clustering the face of “Smiley”. The CPU
time required on a Sun 4 workstation to run the unsupervised algorithm ranged from 8 s to over
100 s, depending on the complexity of the data set. The plain FCS algorithm typically takes only a
few seconds.

5. Conclusions


In this paper, we introduced a new approach to the Fuzzy C-Shells algorithm, which seeks 
clusters in hyperspherical shells. This algorithm does not involve solving coupled nonlinear 
equations, and hence is implementationally more attractive than other clustering algorithms that 
have been suggested in the literature for this purpose. We also presented an unsupervised C-shells 
algorithm which automatically finds the optimum number of hyperspherical clusters when this 
information is not known. The unsupervised algorithm is based on minimizing a new validity 
measure called total average shell thickness. Experimental results on a variety of data sets 
demonstrate that the algorithms are effective.

List of Figures

Figure 1:

(a) Plot of total fuzzy average shell thickness vs number of clusters for two concentric
circles, (b) result of the Unsupervised FCS Algorithm, (c) Plot of total fuzzy average shell
thickness vs number of clusters for “the crying baby”, and (d) result of the Unsupervised
FCS Algorithm.

Figure 2:

Results of the Unsupervised FCS Algorithm (a) for two semicircles, (b) for three
overlapping circles, (c) for five sparsely sampled circles, and (d) for “Smiley”.